-
Notifications
You must be signed in to change notification settings - Fork 2.5k
[HUDI-3900] [UBER] Support log compaction action for MOR tables #5958
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
089748b to
290d6fd
Compare
prasannarajaperumal
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me overall. Minor comments.
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/BaseHoodieWriteClient.java
Show resolved
Hide resolved
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieCompactionConfig.java
Show resolved
Hide resolved
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieCompactionConfig.java
Outdated
Show resolved
Hide resolved
290d6fd to
7202eb3
Compare
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieCompactionConfig.java
Outdated
Show resolved
Hide resolved
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieCompactionConfig.java
Outdated
Show resolved
Hide resolved
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieCompactionConfig.java
Show resolved
Hide resolved
hudi-common/src/main/java/org/apache/hudi/common/table/log/AbstractHoodieLogRecordReader.java
Show resolved
Hide resolved
hudi-common/src/main/java/org/apache/hudi/common/table/log/AbstractHoodieLogRecordReader.java
Show resolved
Hide resolved
hudi-common/src/main/java/org/apache/hudi/common/table/log/block/HoodieCommandBlock.java
Show resolved
Hide resolved
prasannarajaperumal
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changes to AbstractHoodieLogRecordReader needs a more careful review
nsivabalan
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am done reviewing the source code changes except AbstractHoodieLogRecordReader. will review AHLogRecordReader in a day or two.
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/BaseHoodieWriteClient.java
Show resolved
Hide resolved
...t/hudi-client-common/src/main/java/org/apache/hudi/client/utils/MetadataConversionUtils.java
Show resolved
Hide resolved
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieCompactionConfig.java
Outdated
Show resolved
Hide resolved
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieCompactionConfig.java
Outdated
Show resolved
Hide resolved
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieCompactionConfig.java
Show resolved
Hide resolved
hudi-common/src/main/java/org/apache/hudi/common/util/CompactionUtils.java
Show resolved
Hide resolved
hudi-common/src/main/java/org/apache/hudi/common/util/CompactionUtils.java
Show resolved
Hide resolved
hudi-common/src/main/java/org/apache/hudi/common/table/view/AbstractTableFileSystemView.java
Outdated
Show resolved
Hide resolved
hudi-common/src/main/java/org/apache/hudi/common/table/view/HoodieTableFileSystemView.java
Show resolved
Hide resolved
| return TimelineDiffResult.UNSAFE_SYNC_RESULT; | ||
| } | ||
|
|
||
| // Rather than checking for compaction instants with completed check and a commit file, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
trying to see if this is a mandatory fix. can we leave it as is. trying to not make any changes to core flow.
nsivabalan
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
still half way through reviewing AHLogRecordReader. but prasanna raised a question somewhere, if we can filter out log blocks whose size is greater than some threshold while doing log compaction. wanted to comment on that w/o miss.
As per existing patch, guess its not doable. It will result in wrong data being served. Bcoz, the compacted log block will take higher precedence when compared to any blocks prior to that.
for eg,
LB_1, LB_2, LB_3 and LB_4(LB_1, LB_3)
LB_1, 2 and 3 are regular data log blocks, where as LB_4 is the compacted one. but it includes only LB_1 and LB_3. LB_2 is ignored due to its larger size let's say.
So, while reading records from this file slice and finding the final snapshot of all records,
we will essentialy read LB_4 and then LB_2.
But if incase we have records that was updated in both LB_1 and LB_2. ideally we want the version in LB_2 as the final one. but if we read just LB_4 and LB_2, LB_4 will win which implicitly means LB_1's version will take precedence.
So, as of now, the design assumes that any log compaction will include every log block preceding it. just wanted to remind you folks as well. Can we take another pass through entire patch to ensure this assumption holds good everywhere. if not, it could result in wrong data being served.
hudi-common/src/main/java/org/apache/hudi/common/table/log/AbstractHoodieLogRecordReader.java
Show resolved
Hide resolved
hudi-common/src/main/java/org/apache/hudi/common/table/log/AbstractHoodieLogRecordReader.java
Outdated
Show resolved
Hide resolved
hudi-common/src/main/java/org/apache/hudi/common/table/log/AbstractHoodieLogRecordReader.java
Outdated
Show resolved
Hide resolved
hudi-common/src/main/java/org/apache/hudi/common/table/log/AbstractHoodieLogRecordReader.java
Outdated
Show resolved
Hide resolved
| String compactedFinalInstantTime = blockTimeToCompactionBlockTimeMap.get(instantTime); | ||
| if (compactedFinalInstantTime == null) { | ||
| // If it is not compacted then add the blocks related to the instant time at the end of the queue and continue. | ||
| instantToBlocksMap.get(instantTime).forEach(block -> currentInstantLogBlocks.addLast(block)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we can use instantsBlocks
instantsBlocks.forEach(block -> currentInstantLogBlocks.addLast(block));
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
regarding L560.
when we have multiple blocks for a given instantTime, shouldn't we be adding those blocks in reverse order.
For eg,
Lets say, for C1 we have LB_1, LB_2, LB_3. and for C2, we have LB_4.
So, instantToBlocksMap will have 2 entries.
when we process C1, value is going to be LB_1, LB-2, LB_3 right? So, we can't do foreach here. we have to add LB_3, followed by LB_2, followed by LB_1.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this addressed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed this comment now.
hudi-common/src/main/java/org/apache/hudi/common/table/log/AbstractHoodieLogRecordReader.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done w/ my review on AbstractLogRecordReader as well.
I feel the New logic is tad simpler compared to older one and is manageable/comprehensible. Older one had few corner cases here and there. good job on the entire patch @suryaprasanna 👍
hudi-common/src/main/java/org/apache/hudi/common/table/log/AbstractHoodieLogRecordReader.java
Show resolved
Hide resolved
|
@prasannarajaperumal : As of now, Surya has completely introduced a new scan() within AbstractLogRecordReader. It was not easy to re-use code from older one and it was getting complicated and hence. If you can take a look and provide any suggestions, would be nice on how to abstract these out. or can we go ahead w/ this duplicated(somewhat) code. |
prasannarajaperumal
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 Thanks @suryaprasanna . Can you please get to the comments from @nsivabalan ? We can land this in soon I hope.
|
@prasannarajaperumal |
|
hey @suryaprasanna : we also wanted to see if its feasible to not introduce a new action, but re-use the ones for compaction? can you check the feasibility on that? |
8305b96 to
75cceac
Compare
prasannarajaperumal
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@suryaprasanna - Thanks for addressing the comments. Can you look at this specific one?
https://github.com/apache/hudi/pull/5958/files#r956601221
Couple of other minor ones looks like are not handled as well.
@nsivabalan - you mentioned that the new Scan api fixes an other unrelated issue as well. That means we roll out USE_LOG_RECORD_READER_SCAN_V2 to true with priority - correct? Dont have to tie this to log compaction rollout. Tracking this here - https://issues.apache.org/jira/browse/HUDI-4940.
Please update the jira link to the actual issue fixed (or please create one). Thanks.
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieCompactionConfig.java
Outdated
Show resolved
Hide resolved
.../test/java/org/apache/hudi/client/functional/DataValidationCheckForLogCompactionActions.java
Outdated
Show resolved
Hide resolved
hudi-common/src/main/java/org/apache/hudi/common/table/log/AbstractHoodieLogRecordReader.java
Show resolved
Hide resolved
hudi-common/src/main/java/org/apache/hudi/common/table/log/AbstractHoodieLogRecordReader.java
Outdated
Show resolved
Hide resolved
hudi-common/src/main/java/org/apache/hudi/common/table/log/AbstractHoodieLogRecordReader.java
Outdated
Show resolved
Hide resolved
Summary: Make useScanV2 value to read log blocks configurable Reviewers: balajee, O955 Project Hoodie Project Reviewer: Add blocking reviewers, PHID-PROJ-pxfpotkfgkanblb3detq! Reviewed By: balajee, O955 Project Hoodie Project Reviewer: Add blocking reviewers JIRA Issues: HUDI-2398 Differential Revision: https://code.uberinternal.com/D8093885
75cceac to
4590a5c
Compare
|
@prasannarajaperumal and @nsivabalan thanks a lot for reviewing the diff. I have addressed all the comments. |
nsivabalan
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
addressed most comments by myself. have 3 to 4 pending to be discussed.
hudi-common/src/main/java/org/apache/hudi/common/config/HoodieMetadataConfig.java
Outdated
Show resolved
Hide resolved
hudi-common/src/main/java/org/apache/hudi/common/config/HoodieMetadataConfig.java
Outdated
Show resolved
Hide resolved
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieAppendHandle.java
Show resolved
Hide resolved
| final String key = keyIterator.next(); | ||
| HoodieRecord<T> record = (HoodieRecord<T>) recordMap.get(key); | ||
| init(record); | ||
| // For logCompaction operations all the records are read and written as a huge block. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have left a comment elsewhere. I am bit confused here. lets discuss this f2f.
| if (isFileGroupInPendingCompaction(fileGroup)) { | ||
| fileGroup.getAllFileSlices().filter(fs -> !isFileSliceNeededForPendingCompaction(fs) | ||
| && !isFileSliceNeededForPendingLogCompaction(fs)).iterator(); | ||
| if (isFileGroupInPendingCompaction(fileGroup) || isFileGroupInPendingLogCompaction(fileGroup)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry, I don't see this being addressed?
if (isFileGroupInPendingCompaction(fileGroup)) {
// We have already saved the last version of file-groups for pending compaction Id
keepVersions--;
}
Infact, you removed isFileGroupInPendingLogCompaction from the if condition and missed to fix the other impl.
...t-common/src/main/java/org/apache/hudi/table/action/compact/CompactionExecutionStrategy.java
Show resolved
Hide resolved
hudi-common/src/main/java/org/apache/hudi/common/table/log/HoodieMergedLogRecordScanner.java
Outdated
Show resolved
Hide resolved
.../org/apache/hudi/table/action/compact/plan/generators/BaseHoodieCompactionPlanGenerator.java
Show resolved
Hide resolved
.../org/apache/hudi/table/action/compact/plan/generators/BaseHoodieCompactionPlanGenerator.java
Show resolved
Hide resolved
| String compactedFinalInstantTime = blockTimeToCompactionBlockTimeMap.get(instantTime); | ||
| if (compactedFinalInstantTime == null) { | ||
| // If it is not compacted then add the blocks related to the instant time at the end of the queue and continue. | ||
| instantToBlocksMap.get(instantTime).forEach(block -> currentInstantLogBlocks.addLast(block)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this addressed?
nsivabalan
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. good job on the patch buddy! Great job.
What is the purpose of the pull request
This pull request adds support for Log compaction action for MOR table types.
Brief change log
Verify this pull request
This change added tests and can be verified as follows:
Committer checklist
Has a corresponding JIRA in PR title & commit
Commit message is descriptive of the change
CI is green
Necessary doc changes done or have another open PR
For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.